Introduction Clinical risk assessment scores can help risk-stratify patients for cancer-associated thrombosis (CAT); however, their performance within individual cancer subtypes remains suboptimal. Previous studies have explored single plasma biomarkers such as D-dimer or P-selectin, but a validated, non-invasive biomarker signature panel is still lacking. The current study evaluates a plasma proteomic assay and machine learning model for predicting CAT.

Methods We performed a nested case-control study within a cohort of 1,694 patients with newly diagnosed solid tumor malignancy who had plasma samples collected between 2011 and 2023 at the Harris Health System. Patients with early-stage disease, samples taken before cancer diagnosis or after chemotherapy initiation, or inadequate follow-up were excluded. Five common cancer types were selected (breast, colorectal, lung, pancreatic, gastroesophageal) to form an eligible cohort of 312 patients. Incidence density sampling with 1:2 matching on cancer type, stage, and treatment yielded a final analytic cohort of 170 patients (57 CAT cases and 113 controls).
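The incidence density sampling step above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the `Patient` fields, the `incidence_density_sample` function, and the choice to let a patient serve as a control for more than one case (and to let future cases serve as controls for earlier ones) are assumptions based on the textbook form of risk-set sampling. The abstract does not specify these details.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Patient:
    # All field names are hypothetical, chosen for illustration only.
    pid: int
    stratum: Tuple[str, str, str]  # matching factors: (cancer type, stage, treatment)
    follow_up_days: float          # time to CAT, or to censoring if no CAT occurred
    had_cat: bool


def incidence_density_sample(
    patients: List[Patient], ratio: int = 2, seed: int = 0
) -> Dict[int, List[int]]:
    """Map each case's pid to up to `ratio` control pids drawn from its risk set."""
    rng = random.Random(seed)
    matched: Dict[int, List[int]] = {}
    cases = sorted((p for p in patients if p.had_cat), key=lambda p: p.follow_up_days)
    for case in cases:
        # Risk set: same matching stratum, and still event-free and under
        # follow-up at the moment the case had its event.
        risk_set = [
            p for p in patients
            if p.pid != case.pid
            and p.stratum == case.stratum
            and p.follow_up_days > case.follow_up_days
        ]
        k = min(ratio, len(risk_set))  # fewer controls if the risk set is small
        matched[case.pid] = [p.pid for p in rng.sample(risk_set, k)]
    return matched
```

When a risk set holds fewer than two eligible patients, a case receives fewer controls, which is one way a 1:2 design can end with an odd number of controls.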

We performed proteomic profiling using the Olink Explore HT platform, which uses the proximity extension assay to measure plasma proteins as normalized protein expression (NPX) values. A random survival forest (RSF) model was used to predict time to CAT; it was trained on NPX values (5,416 proteins) and clinical variables (age, sex, race, ethnicity, cancer type, stage, diagnosis year) over 100 bootstrapped iterations with resampling. Internal validation used time-dependent receiver operating characteristic (TD-ROC) analysis at days 30, 90, and 180 on out-of-bag (OOB) test samples. Top proteins identified by nonparametric permutation variable importance (VIMP) and stably present across multiple iterations were retained for pathway enrichment analysis using Reactome over-representation analysis, with significance thresholds of p-value <0.05 and FDR q-value <0.1.
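The bootstrap and out-of-bag scheme described above can be illustrated with a short sketch. It assumes that each iteration draws a sample of the same size as the cohort with replacement and that the patients never drawn form that iteration's OOB test set. The function name `bootstrap_oob_splits` is hypothetical, and model fitting and TD-ROC evaluation are deliberately left out.

```python
import random
from typing import List, Tuple


def bootstrap_oob_splits(
    n_samples: int, n_iterations: int = 100, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return (in_bag, out_of_bag) index lists for each bootstrap iteration."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_iterations):
        # Draw n_samples indices with replacement (the in-bag training set).
        in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag_set = set(in_bag)
        # Patients never drawn in this iteration form the OOB test set.
        oob = [i for i in range(n_samples) if i not in in_bag_set]
        splits.append((in_bag, oob))
    return splits


# Cohort size and iteration count taken from the abstract.
splits = bootstrap_oob_splits(n_samples=170, n_iterations=100)
mean_oob_fraction = sum(len(oob) for _, oob in splits) / (100 * 170)
print(f"Mean OOB fraction: {mean_oob_fraction:.3f}")
```

Under this scheme roughly 37% of patients (about 1/e) are expected to fall out of bag in each iteration, so with 170 patients and few early events the 30-day estimates rest on small test sets.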

Results Among 170 patients, the median age was 54 years (IQR 47-60) and 64% were female. Race and ethnicity included 62% Hispanic, 12% Non-Hispanic (NH) White, 18% NH Black, and 7% NH Asian. There were 42 breast, 51 colorectal, 35 lung, 22 pancreatic, and 20 gastroesophageal cancers; 53% were metastatic; and all patients received chemotherapy after sample collection. The 57 CAT events included 23 pulmonary embolisms, 12 lower extremity deep vein thromboses (DVT), 18 upper extremity DVTs, and 4 splanchnic vein thromboses. Median time to CAT was 154 days (IQR 70-262). There were 7, 19, and 31 events by 30, 90, and 180 days, respectively.

Among 5,416 proteins, the NPX values ranged from -11.43 to +12.40 (IQR -0.50 to +0.52). There were no significant differences by batch or storage time. For internal validation, the RSF model achieved a bootstrapped mean TD-ROC area under the curve of 0.82 (95% CI 0.54-0.98), 0.71 (95% CI 0.57-0.88), and 0.67 (95% CI 0.56-0.79) at 30, 90, and 180 days, respectively. Approximately 35% of patients were classified as high-risk. In the OOB test sets, the observed average venous thromboembolism (VTE) incidence in the high-risk group was 13%, 21%, and 26% at 30, 90, and 180 days, respectively. In comparison, the observed average VTE incidence in the low-risk group was 1%, 8%, and 15% at the same time points. The most important proteins included KANK1, F9, CABP4, PRND, and GFPT2, among others. In pathway enrichment analyses, the top 20% of proteomic features were over-represented in neutrophil degranulation, extracellular matrix organization, platelet cytosolic calcium and degranulation, integrin cell surface interactions, regulation of complement cascade, and formation of fibrin clot.
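For readers unfamiliar with over-representation analysis, the sketch below shows the kind of test that typically underlies pathway results like these. It assumes a one-sided hypergeometric test per pathway followed by Benjamini-Hochberg adjustment to obtain q-values. The function names are hypothetical, the choice of background (for example, all assayed proteins) is an assumption, and the numbers in the usage example are toy values, not study data.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n proteins are selected from a background of N,
    of which K belong to the pathway; X counts selected pathway members."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible configurations contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Toy usage: 2 of 3 selected proteins fall in a 4-protein pathway,
# with a background of 10 proteins.
p = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A pathway would then be reported when both its p-value and its q-value fall below the chosen thresholds, which in this abstract are 0.05 and 0.1.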

Conclusion Individual cancer patients have unique prothrombotic expression profiles in the plasma that are prognostic for future CAT occurrence. As the pathway enrichment analyses of the top RSF features suggest, the relationships among the coagulation cascade, neutrophils, platelets, complement, and the extracellular matrix are complex and non-linear. A survival tree-based machine learning model incorporating all plasma proteomic signatures predicted short-term CAT occurrence with better discrimination than long-term occurrence (TD-ROC 0.82 at 30 days vs 0.67 at 180 days). Instead of single-target assays, panel-based plasma proteomic assays may complement existing clinical risk scores and further differentiate the thrombotic risk profiles among individual cancer subtypes. External and prospective cohort validations are needed to ensure reproducibility and generalizability.
